Factors

Factors are variables that take on a limited number of values, also known as categorical variables. In R, a factor is stored as a vector of integer codes together with the corresponding set of character values that you see when the factor is displayed (colloquially, labels; in R, levels).

parcels %>% count(use_code_reduced) # currently a character

parcels %>% 
  mutate(use_code_reduced = factor(use_code_reduced)) %>% # make a factor
  count(use_code_reduced)

# assert the ordering of the factor levels
use_levels <- c("Apartment", "Du-Tri-Quadplex", "Condominium", "Single Family")

parcels %>% 
  mutate(use_code_reduced = factor(use_code_reduced, levels = use_levels)) %>% 
  count(use_code_reduced)
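
To see the integer storage directly, here is a minimal base R sketch. The use_code vector is a hypothetical stand-in for parcels$use_code_reduced, not the real data.

```r
# Hypothetical stand-in for parcels$use_code_reduced
use_code <- c("Condominium", "Apartment", "Single Family", "Apartment")

use_levels <- c("Apartment", "Du-Tri-Quadplex", "Condominium", "Single Family")
use_factor <- factor(use_code, levels = use_levels)

levels(use_factor)      # the labels, in the order we asserted
as.integer(use_factor)  # the underlying integer codes: 3 1 4 1
table(use_factor)       # unused levels (Du-Tri-Quadplex) still appear, with count 0
```

Any value that is not listed in levels becomes NA, so it is worth checking the counts after converting.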

The forcats package, part of the tidyverse, provides helper functions for working with factors, including:

  • fct_infreq(): reorder factor levels by frequency of levels
  • fct_reorder(): reorder factor levels by another variable
  • fct_relevel(): change order of factor levels by hand
  • fct_recode(): change factor levels by hand
  • fct_collapse(): collapse factor levels into defined groups
  • fct_lump(): collapse least/most frequent levels of factor into “Other”
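
A rough sketch of these helpers on a small factor follows. The use_code values and the sale_price vector are made up for illustration and do not come from the parcels data.

```r
library(forcats)

# Hypothetical values, not taken from the parcels data
use_code <- factor(c("Condominium", "Apartment", "Single Family",
                     "Single Family", "Apartment", "Single Family"))
sale_price <- c(90, 100, 300, 350, 120, 400)  # hypothetical second variable

levels(use_code)                                     # alphabetical by default
levels(fct_infreq(use_code))                         # most frequent level first
levels(fct_reorder(use_code, sale_price))            # ordered by median sale_price
levels(fct_relevel(use_code, "Single Family"))       # move one level to the front
levels(fct_recode(use_code, Condo = "Condominium"))  # new name = "old name"
levels(fct_collapse(use_code,
                    "Multi-unit" = c("Apartment", "Condominium")))
levels(fct_lump(use_code, n = 1))                    # keep the top level, lump the rest
```

Each call returns a new factor, so inside a pipeline these would normally sit in a mutate() step.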

Joins

Joins merge data sets based on key variables. The general syntax is name_join(x, y, by = "key"), where name is the type of join, x and y are the two tables, and "key" is the column they share.

Animated visuals created by Garrick Aden-Buie

  • full_join(): keeps all observations in x and y
  • left_join(): keeps all observations in x
  • right_join(): keeps all observations in y
  • inner_join(): keeps only observations whose key appears in both x and y
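
The four joins can be compared on two tiny tables. This is a sketch only: x, y, and the id key are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical tables sharing the key column "id"
x <- tibble(id = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), y_val = c("y1", "y2", "y4"))

full_join(x, y, by = "id")   # ids 1, 2, 3, 4; unmatched cells are NA
left_join(x, y, by = "id")   # ids 1, 2, 3
right_join(x, y, by = "id")  # ids 1, 2, 4
inner_join(x, y, by = "id")  # ids 1, 2
```

Comparing the row counts before and after a join is a quick way to catch unexpected unmatched or duplicated keys.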

Separate/Unite

separate: Split a single column into multiple columns by separating each cell in the column into a row of cells.

separate(df, col = rate, into = c("cases", "pop"), sep = "/")
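
A small runnable version of the call above, assuming a hypothetical df whose rate column stores values as "cases/pop":

```r
library(tidyr)

# Hypothetical data: rate holds cases and population as "cases/pop"
df <- data.frame(country = c("A", "B"),
                 rate = c("2/100", "5/250"))

separated <- separate(df, col = rate, into = c("cases", "pop"), sep = "/")
separated  # columns: country, cases, pop (cases and pop are character)

# convert = TRUE asks separate() to guess a better type for the new columns
converted <- separate(df, col = rate, into = c("cases", "pop"),
                      sep = "/", convert = TRUE)
```

By default the original rate column is dropped and the new columns stay character, which is why convert = TRUE is often useful.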

unite: Combine several columns into a single column by uniting their values across rows.

unite(df, col = year, century:year, sep = "")
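
The same call on a hypothetical df with century and year stored as separate character columns:

```r
library(tidyr)

# Hypothetical data: the year is split across two columns
df <- data.frame(country = c("A", "B"),
                 century = c("19", "20"),
                 year = c("99", "00"))

united <- unite(df, col = year, century:year, sep = "")
united$year  # "1999" "2000"
```

The input columns are removed by default and the result is a character column, so a follow-up as.integer() may be needed for a numeric year.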

Pivot

pivot_longer: Convert wide data to long, or move variable values out of the column names and into the cells.

pivot_longer(df, cols = -country, names_to = "year", values_to = "cases")
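
As a sketch, here is that call on a hypothetical wide table with one column per year:

```r
library(tidyr)

# Hypothetical wide table: one column per year
df <- data.frame(country = c("A", "B"),
                 `1999` = c(10, 20),
                 `2000` = c(15, 25),
                 check.names = FALSE)  # keep "1999" and "2000" as column names

long <- pivot_longer(df, cols = -country, names_to = "year", values_to = "cases")
long  # 4 rows, one per country-year pair
```

The new year column is character because it was built from column names.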

pivot_wider: Convert long data to wide, or move variable names out of the cells and into the column names.

pivot_wider(df, id_cols = country, names_from = type, values_from = count)
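
And the reverse direction, assuming a hypothetical long table in which type names the variable and count holds its value:

```r
library(tidyr)

# Hypothetical long table: two rows per country
df <- data.frame(country = c("A", "A", "B", "B"),
                 type = c("cases", "population", "cases", "population"),
                 count = c(10, 1000, 20, 2000))

wide <- pivot_wider(df, id_cols = country, names_from = type, values_from = count)
wide  # 2 rows, with columns country, cases, population
```

This works cleanly because each country and type pair appears only once. If id_cols does not uniquely identify the rows, pivot_wider() warns and returns list-columns instead.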

Let’s Play with R!

Go to Slack and copy the practice script for today (learningRweek9.R). Then open an RStudio session using the learningRweek9.Rproj file.